The Transformer is an extremely powerful and prominent deep learning architecture. In this work, we challenge the commonly held belief in deep learning that going deeper is better, and present an alternative design approach: building wider attention Transformers. We demonstrate that wide single-layer Transformer models can compete with or outperform deeper ones on a variety of Natural Language Processing (NLP) tasks when both are trained from scratch. We then systematically study the impact of changing the model aspect ratio on Transformers. This ratio balances the number of layers against the number of attention heads per layer while keeping the total number of attention heads and all other hyperparameters constant. On average, across 4 NLP tasks and 10 attention types, single-layer wide models perform 0.3% better than their deep counterparts. We provide an in-depth evaluation and demonstrate that wide models require a far smaller memory footprint and can run faster on commodity hardware; in addition, these wider models are also more interpretable. For example, a single-layer Transformer on byte-level IMDb text classification has 3.1x lower inference latency on a CPU than its equally accurate deeper counterpart, and is half the size. We therefore put forward wider and shallower models as a viable and desirable alternative for small models on NLP tasks, and as an important area of research for domains beyond this.
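A minimal sketch of the "aspect ratio" idea described in this abstract: trading layers against heads per layer while keeping the total head budget fixed. The function name and the head budget of 12 are illustrative assumptions, not the paper's code.

```python
# Illustrative: enumerate Transformer "aspect ratios" that keep the
# total number of attention heads constant, from widest to deepest.

def aspect_ratios(total_heads: int):
    """Yield (num_layers, heads_per_layer) pairs with a constant head budget."""
    return [(layers, total_heads // layers)
            for layers in range(1, total_heads + 1)
            if total_heads % layers == 0]

configs = aspect_ratios(12)
widest = configs[0]    # a single layer holding all 12 heads
deepest = configs[-1]  # 12 layers with one head each
```

Under this framing, the paper's wide models sit at the `widest` end of the list and the conventional deep models toward the `deepest` end.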
Neural networks are vulnerable to adversarial examples, which can cause models to fail. Adversarial training is one of the defenses against adversarial examples: the model is attacked during training and learns to be resilient to those attacks. However, this process is currently expensive: it takes a long time to produce adversarial samples and train the model with them, and, worse, it occasionally fails. In this paper we demonstrate data pruning, a method for improving the efficiency of adversarial training through data subsampling. We show empirically that data pruning improves the convergence and reliability of adversarial training, albeit with varying levels of utility degradation. For example, with random subsampling of CIFAR10 that removes 40% of the data, we lose 8% of adversarial accuracy against the strongest attacker, while using only 20% of the data we lose 14% of adversarial accuracy and reduce the runtime by a factor of 3. Interestingly, we find that in some settings data pruning brings the benefits of both worlds: it improves both adversarial accuracy and training time.
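A minimal sketch of the random-subsampling form of data pruning described above: keep a fixed fraction of training indices and run adversarial training on that subset only. The function name and CIFAR10 size are illustrative stand-ins, not the paper's code.

```python
# Illustrative: data pruning by uniform random subsampling of dataset indices.
import random

def prune_indices(n_examples: int, keep_fraction: float, seed: int = 0):
    """Return a random, reproducible subset of dataset indices to train on."""
    rng = random.Random(seed)
    n_keep = int(n_examples * keep_fraction)
    return sorted(rng.sample(range(n_examples), n_keep))

# E.g., dropping 40% of CIFAR10's 50,000 training images:
subset = prune_indices(50_000, keep_fraction=0.6)
```

The adversarial-example generation loop then only ever touches `subset`, which is where the runtime savings come from.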
Machine learning is vulnerable to adversarial manipulation. Previous literature has demonstrated that, at the training stage, attackers can manipulate data and data-sampling procedures to control model behaviour. A common attack goal is to plant backdoors, i.e. to force the victim model to learn to recognise a trigger known only to the adversary. In this paper, we introduce a new class of backdoor attacks that hide inside the model architecture, i.e. in the inductive bias of the functions used for training. These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture that others will unknowingly reuse. We demonstrate that model-architecture backdoors represent a real threat and, unlike other approaches, can survive a complete retraining from scratch. We formalise the main construction principles behind architectural backdoors, such as a link between the input and the output, and describe some possible protections against them. We evaluate our attacks on computer-vision benchmarks of different scales and demonstrate that the underlying vulnerability is pervasive across a variety of training settings.
The Data Science for Pavements Challenge (DSPC) seeks to accelerate the development of automated vision systems for pavement condition monitoring and evaluation by providing benchmarked datasets and codes for teams to innovate and develop machine learning algorithms that are ready for practical industry use. The first edition of the competition attracted 22 teams from 8 countries. Participants were asked to automatically detect and classify the different types of pavement distress present in images captured from multiple sources and under different conditions. The competition was data-centric: teams were tasked with increasing the accuracy of a predefined model architecture by utilizing various data-modification methods such as cleaning, labeling, and augmentation. A real-time online evaluation system was developed to rank teams based on the F1 score. The leaderboard results showed both the promise and the challenges of machine learning in advancing the automation of pavement monitoring and evaluation. This paper summarizes the solutions of the top 5 teams, which proposed innovations in the areas of data cleaning, annotation, augmentation, and detection-parameter tuning. The highest-ranked team achieved an F1 score of approximately 0.9. The paper concludes with a review of the different experiments that worked well for the current challenge and of those that brought any significant improvement in model accuracy.
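Since the competition ranks teams by F1 score, here is a minimal sketch of that metric computed from raw detection counts. This is an illustration of the metric only, not the organizers' evaluation code.

```python
# Illustrative: F1 score from true-positive, false-positive, false-negative counts.

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a detector with 9 correct detections, 1 spurious detection, and 1 missed distress scores F1 = 0.9, roughly the level of the winning team.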
Federated learning (FL) is a powerful technique for training a model on a server with data from multiple clients in a privacy-preserving manner. In FL, the server sends the model to every client, who then train it locally and send it back to the server. The server aggregates the updated models and repeats the process for several rounds. FL incurs significant communication costs, in particular when transmitting the updated local models from the clients back to the server. Recently proposed algorithms quantize the model parameters to efficiently compress FL communication. These algorithms typically have a quantization level that controls the compression factor. We find that dynamic adjustment of the quantization level can boost compression without sacrificing model quality. First, we introduce a time-adaptive quantization algorithm that increases the quantization level as training progresses. Second, we introduce a client-adaptive quantization algorithm that assigns each individual client the optimal quantization level at every round. Finally, we combine both algorithms into DAdaQuant, the doubly-adaptive quantization algorithm. Our experiments show that DAdaQuant consistently improves client$\rightarrow$server compression, outperforming the strongest non-adaptive baselines by up to $2.8\times$.
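A toy sketch of the two ingredients the abstract describes: a quantizer with an adjustable level `q` (the knob controlling the compression factor) and one plausible time-adaptive schedule for it. Both function names and the doubling schedule are illustrative assumptions; DAdaQuant's actual rules for choosing `q` per round and per client are given in the paper.

```python
# Illustrative: fixed-point quantization with an adjustable level q,
# plus a stand-in "increase the level as training progresses" schedule.

def quantize(values, q: int):
    """Round each value to the nearest multiple of 1/q (q = quantization level)."""
    return [round(v * q) / q for v in values]

def time_adaptive_level(round_idx: int, q_min: int = 2, growth: int = 2) -> int:
    """One possible schedule: start coarse and double the level each round."""
    return q_min * growth ** round_idx
```

Early rounds then ship coarsely quantized (cheap) updates, and precision grows only when training starts to need it.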
We present a dynamic path planning algorithm to navigate an amphibious rotorcraft through a concave time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotorcraft dynamics above and below the water. The 6-degree-of-freedom dynamics are used within a layered architecture to generate motion paths for the vehicle to follow, together with the required control inputs. The rotorcraft has a 3-dimensional map of its surroundings that is updated via limited-range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
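A toy 2-D sketch of the PRM (probabilistic roadmap) construction step mentioned above: sample random configurations and connect nearby pairs into a graph that a search algorithm such as D* Lite can then query. The paper's planner works in 3-D with a quaternion dynamics model and collision checks against the obstacle map; this obstacle-free 2-D version only illustrates the roadmap idea.

```python
# Illustrative: minimal PRM roadmap construction in the unit square.
import math
import random

def build_prm(n_samples: int = 100, connect_radius: float = 0.3, seed: int = 1):
    """Sample configurations and connect all pairs within connect_radius."""
    rng = random.Random(seed)
    nodes = [(rng.random(), rng.random()) for _ in range(n_samples)]
    edges = {i: [] for i in range(n_samples)}
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            if math.dist(nodes[i], nodes[j]) <= connect_radius:
                edges[i].append(j)  # undirected: store both directions
                edges[j].append(i)
    return nodes, edges
```

In a full planner, each candidate edge would additionally be collision-checked against the vehicle's 3-D map before being added.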
Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics are disregarded and the person is treated as a body or a collection of body parts. A first experiment uses standardized images of women from the Sexual OBjectification and EMotion Database, and finds that, commensurate with prior research in psychology, human characteristics are disassociated from images of objectified women: the model's recognition of emotional state is mediated by whether the subject is fully or partially clothed. Embedding association tests (EATs) return significant effect sizes for both anger (d >.8) and sadness (d >.5). A second experiment measures the effect in a representative application: an automatic image captioner (Antarctic Captions) includes words denoting emotion less than 50% as often for images of partially clothed women than for images of fully clothed women. A third experiment finds that images of female professionals (scientists, doctors, executives) are likely to be associated with sexual descriptions relative to images of male professionals. A fourth experiment shows that a prompt of "a [age] year old girl" generates sexualized images (as determined by an NSFW classifier) up to 73% of the time for VQGAN-CLIP (age 17), and up to 40% of the time for Stable Diffusion (ages 14 and 18); the corresponding rate for boys never surpasses 9%. The evidence indicates that language-vision AI models trained on automatically collected web scrapes learn biases of sexual objectification, which propagate to downstream applications.
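The embedding association tests reported above produce a Cohen's-d style effect size over cosine-similarity differences. A minimal sketch of that statistic follows; the vectors fed to it in the tests would be CLIP image/text embeddings, and the names here are illustrative.

```python
# Illustrative: EAT/WEAT-style effect size over embedding similarities.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def association(w, A, B):
    """Mean similarity of w to attribute set A minus to attribute set B."""
    return (sum(cosine(w, a) for a in A) / len(A)
            - sum(cosine(w, b) for b in B) / len(B))

def effect_size(X, Y, A, B):
    """Cohen's-d style differential association of target sets X vs Y."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    pooled = sx + sy
    mean = sum(pooled) / len(pooled)
    std = math.sqrt(sum((s - mean) ** 2 for s in pooled) / (len(pooled) - 1))
    return (sum(sx) / len(sx) - sum(sy) / len(sy)) / std
```

Effect sizes such as the reported d > .8 for anger come from applying this kind of statistic to embeddings of objectified vs non-objectified images (targets) and emotion stimuli (attributes).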
Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as in supply chains (inventory optimization), traffic, and in the transition towards carbon-free energy generation in battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim to foster and facilitate research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It then focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that lead to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
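The sample average approximation used by the winning method can be sketched in a few lines: commit to the single decision that minimizes the average cost across many sampled forecast scenarios. The candidate decisions and quadratic cost below are toy stand-ins for the battery/lecture scheduling problem, not the winner's formulation.

```python
# Illustrative: sample average approximation (SAA) over forecast scenarios.

def saa_choose(candidates, scenarios, cost):
    """Pick the candidate minimizing mean cost across sampled scenarios."""
    def avg_cost(x):
        return sum(cost(x, s) for s in scenarios) / len(scenarios)
    return min(candidates, key=avg_cost)

# Toy use: decision x = battery charge level; scenario s = sampled
# renewable-production forecast; cost penalizes the mismatch.
scenarios = [0.2, 0.5, 0.8]
best = saa_choose([0.0, 0.5, 1.0], scenarios,
                  cost=lambda x, s: (x - s) ** 2)
```

In the competition setting, the inner optimization per scenario would itself be a mixed-integer program rather than a closed-form cost.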
We apply the vision transformer, a deep machine learning model built around the attention mechanism, to mel-spectrogram representations of raw audio recordings. When adding mel-based data augmentation techniques and sample-weighting, we achieve comparable performance on both (PRS and CCS challenge) tasks of ComParE21, outperforming most single model baselines. We further introduce overlapping vertical patching and evaluate the influence of parameter configurations. Index Terms: audio classification, attention, mel-spectrogram, unbalanced data-sets, computational paralinguistics
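A minimal sketch of what "overlapping vertical patching" plausibly means here: slicing a mel-spectrogram (mel bins x time frames) into full-height patches of fixed width whose time windows overlap. The patch width and stride below are illustrative assumptions, not the paper's settings.

```python
# Illustrative: full-height, overlapping time patches of a 2-D spectrogram.

def vertical_patches(spec, width: int, stride: int):
    """spec: 2-D list (rows = mel bins, cols = time frames).
    Returns full-height patches of `width` frames, advancing by `stride`;
    stride < width makes consecutive patches overlap."""
    n_frames = len(spec[0])
    patches = []
    for start in range(0, n_frames - width + 1, stride):
        patches.append([row[start:start + width] for row in spec])
    return patches

spec = [[t for t in range(10)] for _ in range(4)]  # 4 mel bins, 10 frames
patches = vertical_patches(spec, width=4, stride=2)  # adjacent windows share 2 frames
```

Each patch would then be flattened and linearly embedded as one token for the vision transformer.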
Common to all different kinds of recurrent neural networks (RNNs) is the intention to model relations between data points through time. When there is no immediate relationship between subsequent data points (e.g., when the data points are generated at random), we show that RNNs are still able to remember a few data points back into the sequence by memorizing them by heart using standard backpropagation. However, we also show that for classical RNNs, LSTM, and GRU networks, the distance between recurrent calls over which data points can be reproduced this way is highly limited (compared to even a loose connection between data points) and subject to various constraints imposed by the type and size of the RNN in question. This implies the existence of a hard limit (far below the information-theoretic one) on the distance between related data points within which RNNs are still able to recognize said relation.
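A sketch of the kind of delayed-recall probe this setup implies: inputs are random, and the target at step t is the input from k steps earlier, so the only way a network can succeed is by memorizing inputs verbatim. This data generator is an illustrative assumption about the experimental setup (no RNN training included).

```python
# Illustrative: a delayed-recall task over random inputs.
import random

def delayed_recall(seq_len: int, delay: int, seed: int = 0):
    """Inputs are random digits; target at step t is the input at t - delay.
    Targets are None where no input exists that far back."""
    rng = random.Random(seed)
    xs = [rng.randint(0, 9) for _ in range(seq_len)]
    ys = [None] * delay + xs[:seq_len - delay]
    return xs, ys
```

Sweeping `delay` upward and measuring where recall accuracy collapses is one way to locate the hard memorization limit the abstract describes.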